
Optimize PipelineConfiguration-checking ClusterStateListeners #117038

Merged: 14 commits into elastic:main, Nov 19, 2024

Conversation


@joegallo joegallo commented Nov 19, 2024

Before (this PR and the various semi-associated PRs that have already been merged):

[Screenshot: 2024-11-19 at 10:47:11 AM]

After (this PR and the various semi-associated PRs that have already been merged):

[Screenshot: 2024-11-19 at 10:47:28 AM]

I ran into this in the /_nodes/hot_threads output of a real in-the-wild cluster and it's been on my list ever since.

Two of our ClusterStateListeners operate on PipelineConfiguration: the GeoIpDownloaderTaskExecutor and the IndexTemplateRegistry. Both ask the PipelineConfiguration for details that it doesn't actually have on hand, so we have to parse the XContent of the pipeline in order to retrieve them. As a consequence, we're parsing the XContent of some of these pipelines twice (once per listener) on every cluster state update. In the case of the GeoIpDownloaderTaskExecutor we have to do that for every pipeline there is, so we're adding a cost to the master that scales with the number of pipelines; for clusters with large numbers of pipelines, the actual CPU cost expended can start to matter (computers are very fast, of course, but things like this add up). This doesn't change the big-O of that part of the work (we're still asking questions of every pipeline), but it does change the code so that we can answer those questions with objects that are already in memory rather than needing to parse yaml-or-json to get them.

In terms of implementation, PipelineConfiguration holds the core of the changes. Rather than keeping the unparsed XContent around, it now maintains an unmodifiable parsed version of the same data. Other parts of ingest (most notably, creating a Pipeline from a PipelineConfiguration) rely on having a mutable copy of the PipelineConfiguration's config, so there's a getter that accepts a boolean which can be used to ask for that.
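
The copy-on-read idea described above can be sketched roughly as follows. This is a hypothetical, self-contained illustration, not the actual Elasticsearch code: `innerDeepCopy` and `Collections.unmodifiableList` do appear in the PR's diff, but the class name, `main` method, and sample config here are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: recursively copy a parsed pipeline config,
// optionally wrapping each container as unmodifiable.
class DeepCopySketch {
    static Object innerDeepCopy(Object value, boolean unmodifiable) {
        if (value instanceof Map<?, ?> map) {
            Map<Object, Object> copy = new LinkedHashMap<>(map.size());
            for (Map.Entry<?, ?> entry : map.entrySet()) {
                copy.put(entry.getKey(), innerDeepCopy(entry.getValue(), unmodifiable));
            }
            return unmodifiable ? Collections.unmodifiableMap(copy) : copy;
        } else if (value instanceof List<?> list) {
            List<Object> copy = new ArrayList<>(list.size());
            for (Object item : list) {
                copy.add(innerDeepCopy(item, unmodifiable));
            }
            return unmodifiable ? Collections.unmodifiableList(copy) : copy;
        } else {
            // Scalars (String, Number, Boolean, null) are immutable; share them.
            return value;
        }
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        Map<String, Object> config = new LinkedHashMap<>();
        config.put("processors", new ArrayList<>(List.of(Map.of("set", Map.of("field", "foo")))));

        Map<String, Object> frozen = (Map<String, Object>) innerDeepCopy(config, true);
        Map<String, Object> mutable = (Map<String, Object>) innerDeepCopy(config, false);

        mutable.put("description", "ok");      // the mutable copy accepts writes
        boolean threw = false;
        try {
            frozen.put("description", "nope"); // the unmodifiable copy rejects them
        } catch (UnsupportedOperationException e) {
            threw = true;
        }
        System.out.println(threw && mutable.containsKey("description")); // prints "true"
    }
}
```

Scalars are shared rather than copied here because String, Number, and Boolean are immutable in Java; only the containers need fresh (or wrapped) instances.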


Related to #116988, #115355, #115348, and #115347 -- those were small 'jab' optimization PRs laying the groundwork for this final one. Similarly see also #115425, #115423, and #115421 -- the cleanups in those PRs were mostly intended to make this PR more readable and simpler in the end.

Closes #97382

@joegallo joegallo requested a review from a team as a code owner November 19, 2024 16:02
@elasticsearchmachine elasticsearchmachine added labels needs:triage (Requires assignment of a team area label) and v9.0.0, Nov 19, 2024
@joegallo joegallo force-pushed the optimize-pipeline-configuration-checks branch from ab574b1 to b41f212 on November 19, 2024 16:06
@joegallo joegallo added labels :Data Management/Ingest Node (Execution or management of Ingest Pipelines including GeoIP), >refactoring, Team:Data Management (Meta label for data/management team), auto-backport (Automatically create backport pull requests when merged), and v8.17.0, and removed label needs:triage (Requires assignment of a team area label), Nov 19, 2024
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-data-management (Team:Data Management)

@original-brownbear (Member) left a comment


LGTM from my end :) Seems this should serialize a little more compactly than JSON as well, and it removes the weird tracking of the content type :)

if (in.getTransportVersion().onOrAfter(TransportVersions.INGEST_PIPELINE_CONFIGURATION_AS_MAP)) {
    config = in.readGenericMap();
} else {
    final BytesReference bytes = in.readBytesReference();
Member


NIT: Not too important since it's BwC and these things aren't gigantic, but in the spirit of doing this consistently you could actually use readSlicedBytesReference here.

@joegallo (Contributor Author)


Nice, okay, 8467326.

Member


in the spirit of doing this consistently, you could use readSlicedBytesReference actually here.

@original-brownbear can you explain more about this? readSlicedBytesReference() just calls readBytesReference(), so is it just for naming consistency?

Member


That delegation is only the default behavior. For the real-world network buffers we have here, there are overriding implementations that just slice the underlying buffer without copying.

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, thanks for the explanation!
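
The slice-versus-copy distinction described above can be illustrated with plain java.nio. This is an analogy using standard-library buffers, not the Elasticsearch StreamInput implementation; the string contents and variable names are invented for the example.

```java
import java.nio.ByteBuffer;

// Plain java.nio illustration of slice-vs-copy (not Elasticsearch code):
// a slice is a view over the same backing array, while a copy allocates.
class SliceVsCopy {
    public static void main(String[] args) {
        ByteBuffer network = ByteBuffer.wrap("header|payload".getBytes());
        network.position(7); // pretend the 7-byte header has already been read

        // Copy: allocates a fresh array and copies the remaining bytes into it.
        byte[] copied = new byte[network.remaining()];
        network.duplicate().get(copied);

        // Slice: a zero-copy view sharing the original backing array.
        ByteBuffer sliced = network.slice();

        System.out.println(new String(copied));                                 // payload
        System.out.println(sliced.hasArray() && sliced.array() == network.array()); // true
    }
}
```

The payoff is the same shape as what's described above: returning a view over the already-received network buffer avoids allocating and copying for every read.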

        copy.add(innerDeepCopy(itemValue, unmodifiable));
    }
    return unmodifiable ? Collections.unmodifiableList(copy) : copy;
} else if (value == null || value instanceof String || value instanceof Number || value instanceof Boolean) {
Member


Could just put this condition in the assertion since it only has an effect with assertions on anyway, otherwise it's the same as the else branch?

@joegallo (Contributor Author)


I was on the fence about that one; I'm happy to take your comment as a tiebreaker in the opposite direction: 659ad01.
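
Concretely, the suggestion and its resolution look something like this hypothetical before/after sketch (not the actual PR diff; the class and method names are invented):

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the reviewer's suggestion: since the scalar type
// check only matters when assertions are enabled, fold it into the assert
// and let scalars share the plain pass-through branch.
class AssertFoldSketch {
    static Object deepCopy(Object value) {
        if (value instanceof Map<?, ?> || value instanceof List<?>) {
            // containers would be recursively copied here (elided)
            return value;
        } else {
            // before: `else if (value == null || value instanceof String || ...)`
            // after: the type whitelist lives only in the assertion
            assert value == null || value instanceof String || value instanceof Number || value instanceof Boolean
                : "unexpected config value type";
            return value;
        }
    }

    public static void main(String[] args) {
        System.out.println(deepCopy("a string")); // prints "a string"
    }
}
```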

@masseyke (Member) left a comment


LGTM

@joegallo joegallo merged commit 123b103 into elastic:main Nov 19, 2024
17 checks passed
@joegallo joegallo deleted the optimize-pipeline-configuration-checks branch November 19, 2024 21:42
@elasticsearchmachine (Collaborator)

💔 Backport failed

Branch 8.x: Commit could not be cherry-picked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 117038

@nielsbauman (Contributor)

This is related to #97382. As @joegallo mentioned, this doesn't change the big-O of the cluster state appliers, but it probably helps a lot - seeing as the hot threads in that issue all include the (now improved) getConfigAsMap.

@joegallo what do you think, should we close that issue since this change probably helped a lot, or do we keep it open because the big-O of those appliers hasn't changed? I'm inclined to go with the former. If anyone runs into this again, we can always open the issue again.

@joegallo (Contributor Author) commented Nov 20, 2024

Yeahhh, that has to do with the optimization of the cluster state appliers rather than the cluster state listeners, but indeed my changes would very much have improved them, too, or so I'd think. I'm going to spend half an hour microbenchmarking that to see if the difference is measurable. Assuming it is, I'll update the description of this PR and close that issue. Very nice catch, @nielsbauman!

@joegallo (Contributor Author)

Adding 2000 ingest pipelines on a three node 8.15.4 cluster took 31.6 minutes, doing the same on 9.0.0 with this PR merged took 18.05 minutes, so that's a noticeable improvement.

Here's the CPU utilization for both of those clusters while the pipelines were being added, which really drives home how much less work the 9.0.0 version was doing (note that you can see my 30-minutes-versus-18-minutes claim in the charts, too):

[Screenshots: CPU utilization charts, 2024-11-20 at 12:50:30 PM and 12:50:39 PM]

@joegallo (Contributor Author)

Based on the above, I do think it's fair to close #97382. Indeed, it doesn't change the big-O runtime there (we're still checking all the pipelines), but it does mean we're not parsing the x-content of the pipelines during that loop, so it's probably fast enough now to be considered 'solved'.

@nielsbauman (Contributor)

That's a speed-up if I've ever seen one! Definitely worth a bi-weekly highlight at the very least.

rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Nov 20, 2024
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Nov 29, 2024
We don't need to pass around the REST request body verbatim when
processing a put-pipeline request, parsing it repeatedly as it's needed.
Instead we can parse the body once when starting to process the request
on the master, and pass the parsed `Map` around instead.

Relates elastic#117038
Successfully merging this pull request may close these issues.

Improve Ingest Pipelines comparison when applying cluster state updates
6 participants